
feat: session transcripts for LLM KV cache stability#512

Merged
senamakel merged 6 commits into tinyhumansai:main from senamakel:feat/llm-kv-cache
Apr 12, 2026

Conversation

@senamakel
Member

@senamakel senamakel commented Apr 12, 2026

Summary

  • Session transcript persistence: Stores the exact Vec<ChatMessage> sent to the LLM provider as .md files at {workspace}/sessions/DDMMYYYY/{agent}_{index}.md. On session resume, the transcript is read back to produce byte-identical messages, ensuring the inference backend's KV cache prefix is reused rather than re-prefilled.
  • API usage/cache parsing: Parses the openhuman.usage and openhuman.billing metadata from API responses (including cached_input_tokens and charged_amount_usd) into UsageInfo, which was previously always None from the compatible provider.
  • Cache stats in transcript headers: Each session file includes cache hit percentage, token counts, and billing in the HTML comment header for debugging.
  • Sub-agent transcript support: Both typed and fork-mode sub-agents now write their own session transcripts.
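The path scheme from the first bullet can be sketched in a few lines (the function name here is hypothetical; the real resolution logic lives in `transcript.rs` and also handles index allocation):

```rust
use std::path::{Path, PathBuf};

// Sketch of the `{workspace}/sessions/DDMMYYYY/{agent}_{index}.md` layout
// described in the summary. The date string is passed in rather than read
// from the clock so the example stays deterministic.
fn transcript_path(workspace: &Path, date_ddmmyyyy: &str, agent: &str, index: u32) -> PathBuf {
    workspace
        .join("sessions")
        .join(date_ddmmyyyy)
        .join(format!("{agent}_{index}.md"))
}
```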

Key files

| File | Change |
| --- | --- |
| `src/openhuman/agent/harness/session/transcript.rs` | New — format parser/writer, path resolution, escaping |
| `src/openhuman/agent/harness/session/turn.rs` | Read hook at turn start, write hook at turn end, cached message usage |
| `src/openhuman/agent/harness/session/types.rs` | New fields: `session_transcript_path`, `cached_transcript_messages`, `agent_definition_name` |
| `src/openhuman/agent/harness/subagent_runner.rs` | Transcript persistence for sub-agents |
| `src/openhuman/providers/compatible.rs` | Parse usage + openhuman blocks from API response |
| `src/openhuman/providers/traits.rs` | `UsageInfo` gains `cached_input_tokens`, `charged_amount_usd` |

Example transcript header

```
<!-- session_transcript
agent: main
dispatcher: native
cache_boundary: 1847
created: 2026-04-11T14:30:00Z
updated: 2026-04-11T14:35:22Z
turn_count: 3
input_tokens: 5000
output_tokens: 1200
cached_input_tokens: 3500
cache_hit_pct: 70.0%
charged_usd: $0.004500
-->
```
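The `cache_hit_pct` line in the header is derivable from the two token counts above (3500 cached of 5000 input tokens gives 70%). A hypothetical helper of this shape computes it:

```rust
// Hypothetical helper: percentage of input tokens that were served from the
// provider's KV cache rather than re-prefilled.
fn cache_hit_pct(cached_input_tokens: u64, input_tokens: u64) -> f64 {
    if input_tokens == 0 {
        return 0.0;
    }
    cached_input_tokens as f64 / input_tokens as f64 * 100.0
}
```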

Test plan

  • cargo check — clean compilation (core + Tauri shell)
  • cargo test — all 2222 tests pass (0 failures)
  • 12 dedicated transcript unit tests: round-trip fidelity, escaping, empty content, whitespace preservation, path resolution, meta parsing with cache stats
  • Manual: run agent session, verify {workspace}/sessions/ directory and transcript content
  • Manual: check cache_hit_pct in transcript header matches expected behavior
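The round-trip fidelity tests listed above boil down to checks of roughly this shape; the delimiter constants and single-pass escaping follow the scheme discussed in the review comments, simplified from the real test module:

```rust
// Simplified sketch of the round-trip property the transcript tests assert:
// escaping the closing delimiter on write and unescaping it on read must
// restore the original message content byte-for-byte.
const MSG_CLOSE: &str = "<!--/MSG-->";
const MSG_CLOSE_ESCAPED: &str = "<!--\\/MSG-->";

fn escape_content(content: &str) -> String {
    content.replace(MSG_CLOSE, MSG_CLOSE_ESCAPED)
}

fn unescape_content(content: &str) -> String {
    content.replace(MSG_CLOSE_ESCAPED, MSG_CLOSE)
}
```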

Summary by CodeRabbit

  • New Features
    • Agents can carry human-readable definition names for transcript identification.
    • Session transcripts persist to disk, can be resumed across runs, and record per-turn provider messages and aggregated usage.
    • Usage metrics now include cached input tokens and per-request billing amounts for finer cost tracking.
  • Tests
    • Comprehensive unit tests added to validate transcript read/write, escaping, path resolution, and resume behavior.

…e stability

- Added support for session transcript persistence, allowing the exact `Vec<ChatMessage>` to be stored as human-readable `.md` files.
- Introduced methods for loading previous transcripts to pre-populate messages for KV cache reuse, enhancing session continuity.
- Updated `AgentBuilder` to include an agent definition name for transcript file naming.
- Refactored `Agent` to handle transcript loading and persistence during session turns, ensuring byte-identical message handling across sessions.
- Created a new `transcript.rs` module to encapsulate transcript-related functionality, improving code organization and maintainability.
…tadata

- Added `usage` and `openhuman` fields to the `ApiChatResponse` struct to capture standard OpenAI usage metrics and OpenHuman backend metadata.
- Introduced `ApiUsage`, `OpenHumanMeta`, and related structs to encapsulate detailed usage and billing information.
- Updated `parse_native_response` and `extract_usage` methods to process and return usage data, improving the integration of usage tracking in chat responses.
- Modified the `UsageInfo` struct in traits to include new fields for cached input tokens and charged amount, enhancing the overall usage reporting capabilities.
…der responses

- Added usage extraction functionality to the `OpenAiCompatibleProvider`, enhancing the API response structure to include usage metrics.
- Updated the `parse_native_response` method to utilize a new `wrap_message` function for improved testability and clarity in response handling.
- Refactored tests to accommodate the new usage extraction logic, ensuring comprehensive coverage of the updated response structure.
…e tracking

- Added functionality to persist sub-agent conversation transcripts, enabling inspection for debugging and KV cache analysis.
- Introduced a new `persist_subagent_transcript` function to handle transcript writing, including metadata for input/output tokens and charged amounts.
- Updated the `Agent` implementation to accumulate usage statistics across iterations and include them in the session transcript.
- Enhanced the `write_transcript` function to include additional metadata fields, improving the detail of stored transcripts.
- Refactored tests to validate the new transcript persistence and usage tracking features, ensuring comprehensive coverage.
- Reformatted the `write_transcript` and `read_transcript` functions for better clarity by adjusting line breaks and indentation.
- Enhanced the `parse_meta` function to improve readability by restructuring the `get` closure.
- Updated test cases to maintain consistency with the new formatting, ensuring clarity in the test structure.
- Overall, these changes aim to enhance code maintainability and readability without altering functionality.
@coderabbitai
Contributor

coderabbitai bot commented Apr 12, 2026

📝 Walkthrough

Walkthrough

Adds session transcript persistence and resume: a new transcript module for read/write/find/resolve; agent-level fields and a builder setter for the agent name; turn and sub-agent code to load and persist transcripts and aggregate usage; and provider `UsageInfo` extended with cached-token and billing fields.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Session Transcript Module: `src/openhuman/agent/harness/session/transcript.rs`, `src/openhuman/agent/harness/session/mod.rs` | New crate-visible transcript module implementing MD-based transcript read/write, `resolve_new_transcript_path`, `find_latest_transcript`, metadata types, escaping semantics, and comprehensive unit tests. |
| Agent Types & Builder: `src/openhuman/agent/harness/session/types.rs`, `src/openhuman/agent/harness/session/builder.rs` | Added `agent_definition_name`, `session_transcript_path`, `cached_transcript_messages` to `Agent`; added `AgentBuilder::agent_definition_name(...)` setter; builder defaults `agent_definition_name` to "main" when unset and initializes transcript path/cached messages to `None`. |
| Session Turn Logic: `src/openhuman/agent/harness/session/turn.rs` | On turn start, attempt resume by loading latest transcript when history empty; support one-time cached prefix consumption and delta-appending for provider messages; accumulate per-turn usage and persist session transcript after successful turns; added `try_load_session_transcript()` and `persist_session_transcript()`. |
| Subagent Runner Persistence: `src/openhuman/agent/harness/subagent_runner.rs` | Inner loop now returns aggregated usage; typed/fork modes call `persist_subagent_transcript()` to write transcripts (best-effort) including cache boundary, token counts, and charged amount. |
| Provider Usage & Parsing: `src/openhuman/providers/traits.rs`, `src/openhuman/providers/compatible.rs` | Added `UsageInfo.cached_input_tokens` and `charged_amount_usd` fields; extended `ApiChatResponse` parsing to include OpenHuman usage metadata; introduced `extract_usage()` and updated `parse_native_response` to return usage-aware `ProviderChatResponse`. |
| Test Initializers (no API change): `src/openhuman/context/guard.rs`, `src/openhuman/context/manager.rs`, `src/openhuman/context/pipeline.rs` | Unit tests updated to use `..Default::default()` for `UsageInfo` literals to fill unspecified fields. |
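The usage-fallback behaviour described for the provider cohort (and tightened in review so that missing OpenHuman keys yield `None` instead of 0) follows this pattern. The struct and field names below are simplified illustrations, not the PR's exact types:

```rust
// Sketch of the fallback chain: prefer the OpenHuman metadata block when a
// field is actually present, otherwise fall back to the standard OpenAI-style
// usage block. Using Option<u64> (not #[serde(default)] on u64) is what keeps
// a missing key from deserializing to Some(0) and blocking the fallback.
#[derive(Default)]
struct OpenHumanUsage {
    input_tokens: Option<u64>, // None when the key is absent in the response
}

struct StdUsage {
    prompt_tokens: u64,
}

fn merged_input_tokens(oh: Option<&OpenHumanUsage>, std_usage: Option<&StdUsage>) -> Option<u64> {
    oh.and_then(|u| u.input_tokens)
        .or_else(|| std_usage.map(|u| u.prompt_tokens))
}
```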

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Agent as Agent
    participant Transcript as Transcript Module
    participant FS as Filesystem
    participant Provider as Provider

    Note over Agent,Transcript: Turn start (resume attempt)
    Agent->>Agent: if history empty && cached_transcript_messages.is_none()
    Agent->>Transcript: find_latest_transcript(workspace, agent_name)
    Transcript->>FS: scan today's and yesterday's session dirs
    FS-->>Transcript: return latest path?
    alt path found
        Agent->>Transcript: read_transcript(path)
        Transcript->>FS: read & parse .md into messages + meta
        FS-->>Transcript: SessionTranscript
        Transcript-->>Agent: provide messages + meta
        Agent->>Agent: set cached_transcript_messages & cache_boundary
    else no path
        Transcript-->>Agent: none
    end

    Note over Agent,Provider: Provider interaction loop
    loop per provider iteration
        alt resumed & first iteration
            Agent->>Agent: consume cached prefix, append history delta
        else
            Agent->>Agent: build messages from full history
        end
        Agent->>Provider: send messages
        Provider-->>Agent: response + usage
        Agent->>Agent: accumulate usage & capture last-sent messages
    end

    Note over Agent,Transcript: Turn completion
    alt turn succeeded
        Agent->>Transcript: persist_session_transcript(path?, messages, aggregated_usage, meta)
        Transcript->>FS: create date dirs & write .md with metadata + <!--MSG--> blocks
        FS-->>Transcript: write success/failure
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰
With whiskers poised by midnight's pane,
I saved our chats in markdown grain,
Cached prefixes tucked in rows so neat,
Resume, persist—then hop a beat.
Hooray, small rabbit, tracks complete!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: session transcripts for LLM KV cache stability, which directly relates to the primary objective of enabling KV cache reuse through byte-identical message persistence. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
src/openhuman/providers/compatible.rs (1)

1066-1099: Add debug trace for usage-source selection in extract_usage.

This branch is critical for token accounting and billing diagnostics; please add a structured debug log indicating whether values came from openhuman or standard usage and whether billing metadata was present (without sensitive payload logging).
As per coding guidelines: src/**/*.rs: Add substantial, development-oriented logs on new/changed flows; include logs at branch decisions and error handling, with structured stable prefixes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/providers/compatible.rs` around lines 1066 - 1099, In
extract_usage, add a structured debug trace that logs which source provided
token counts (openhuman vs standard usage) and whether billing metadata was
present: inspect oh, std_usage, oh_usage, and oh_billing and emit a
development-level logger call (with a stable prefix) that records booleans like
from_openhuman, from_standard, and has_billing and the numeric token values
(input_tokens, output_tokens, cached_input_tokens) but no sensitive payloads;
ensure the log is placed after computing the token variables and before
returning the ProviderUsageInfo so the instrumentation reflects the branch
decision used to populate ProviderUsageInfo.
src/openhuman/agent/harness/subagent_runner.rs (1)

432-444: Sub-agent transcripts lack usage metrics.

The token and cost fields are hardcoded to zero, whereas the main agent's turn.rs accumulates actual usage from provider responses. For consistency and debugging value, consider threading usage info from run_inner_loop to the transcript. Currently run_inner_loop discards the ChatResponse.usage data.

This is acceptable for the initial implementation since transcripts are primarily for KV cache prefix replay, and sub-agent usage is typically a small fraction of total cost.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 432 - 444,
TranscriptMeta is being populated with zeroed usage and cost fields; update the
pipeline to thread actual usage from run_inner_loop into the transcript: modify
run_inner_loop (and its callers) to return or expose the ChatResponse.usage (or
an aggregated Usage struct) instead of discarding it, sum input/output/token
usage and charged amount across sub-agent provider responses, and then set
TranscriptMeta.input_tokens, output_tokens, cached_input_tokens (if applicable),
and charged_amount_usd from that aggregated Usage when constructing
TranscriptMeta in subagent_runner.rs so transcripts reflect real usage for
debugging and accounting.
src/openhuman/agent/harness/session/transcript.rs (1)

207-213: Edge case in escape/unescape.

If content contains the literal escape sequence <!--\/MSG-->, it will be corrupted to <!--/MSG--> on round-trip. This is extremely unlikely in practice (LLM conversations won't contain this exact byte sequence), but worth documenting in a code comment for future maintainers.

📝 Suggested documentation
 fn escape_content(content: &str) -> String {
+    // Note: if content contains the literal escape sequence `<!--\/MSG-->`,
+    // it will be incorrectly unescaped on read. This is an accepted limitation
+    // for an extremely unlikely edge case.
     content.replace(MSG_CLOSE, MSG_CLOSE_ESCAPED)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/session/transcript.rs` around lines 207 - 213,
The current escape_content and unescape_content functions perform a simple
replace between MSG_CLOSE and MSG_CLOSE_ESCAPED but are not fully reversible if
the original content already contains the literal escape sequence
(<!--\/MSG-->), which will be transformed on round-trip; add a concise code
comment above the functions escape_content and unescape_content documenting this
edge case (mention the exact sequences MSG_CLOSE and MSG_CLOSE_ESCAPED and that
an input containing MSG_CLOSE_ESCAPED will be corrupted on unescape), and
optionally note that a more robust fix would be to use a non-ambiguous encoding
or a two-pass escaping scheme if full reversibility is later required.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/providers/compatible.rs`:
- Around line 1000-1008: The helper parse_native_response currently returns a
successful ProviderChatResponse with text None when api_response.choices is
empty; change parse_native_response to return a Result<ProviderChatResponse,
Error> (or existing error type) and propagate an Err when
choices.into_iter().next() is None so empty choices are treated as an error;
update callers (e.g., chat() and the other occurrence around
parse_native_response usage at the second site) to handle the Result (propagate
or map the error) to keep the contract consistent and avoid silently accepting
invalid payloads.
- Around line 333-343: OpenHumanUsage currently defines non-optional u64 token
fields (input_tokens/output_tokens/total_tokens/cached_input_tokens) with
#[serde(default)] which deserializes missing fields to 0 and causes
oh_usage.map(|u| u.input_tokens) to return Some(0) and block fallback; change
those fields to Option<u64> (e.g., input_tokens: Option<u64>) and update
extract_usage to use oh_usage.and_then(|u| u.input_tokens) / and_then for each
token so missing keys yield None and allow the fallback chain to proceed; also
add a debug/trace log in extract_usage noting which branch provided the usage
(OpenHuman vs standard source) when making the branch decision so flow changes
are observable.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4f588128-4c71-45d8-8ba5-4b01eac28a71

📥 Commits

Reviewing files that changed from the base of the PR and between 1cb70e3 and 3b1ed77.

📒 Files selected for processing (11)
  • src/openhuman/agent/harness/session/builder.rs
  • src/openhuman/agent/harness/session/mod.rs
  • src/openhuman/agent/harness/session/transcript.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/session/types.rs
  • src/openhuman/agent/harness/subagent_runner.rs
  • src/openhuman/context/guard.rs
  • src/openhuman/context/manager.rs
  • src/openhuman/context/pipeline.rs
  • src/openhuman/providers/compatible.rs
  • src/openhuman/providers/traits.rs

…64> for OpenHumanUsage fields

- parse_native_response now returns Result and propagates an error when
  choices is empty instead of silently returning text: None
- OpenHumanUsage token fields changed to Option<u64> so missing keys
  yield None and allow fallback to standard OpenAI usage block
- extract_usage now logs which source (openhuman vs standard) provided
  token counts via structured tracing
- run_inner_loop returns AggregatedUsage so subagent transcripts reflect
  real usage instead of zeros
- Document escape_content/unescape_content edge case with pre-existing
  escape sequences
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/agent/harness/session/transcript.rs`:
- Around line 207-223: The current escape_content/unescape_content pair is not
bijective because a pre-existing literal MSG_CLOSE_ESCAPED becomes MSG_CLOSE
during unescape; fix by making the scheme reversible: in escape_content
(function escape_content) first escape backslashes (e.g., replace "\" with an
escaped marker) and then escape MSG_CLOSE (MSG_CLOSE -> MSG_CLOSE_ESCAPED), and
in unescape_content perform the exact reverse order (first unescape
MSG_CLOSE_ESCAPED -> MSG_CLOSE, then unescape the backslash marker -> "\"), or
alternatively switch to a length-delimited encoding (prefix content with its
byte length) inside escape_content and decode accordingly in unescape_content;
update references to MSG_CLOSE and MSG_CLOSE_ESCAPED and ensure both functions
are exact inverses so every input round-trips exactly.
- Around line 166-175: resolve_new_transcript_path currently races by returning
a suggested filename (max+1) without reserving it; change the allocation to
atomically reserve a new transcript file by attempting to create the file with
O_EXCL style semantics in a retry loop: in resolve_new_transcript_path (or a new
helper e.g., reserve_new_transcript_path) compute sanitized name and index, then
try to open/create the candidate path using create_new
(OpenOptions::new().write(true).create_new(true)) or equivalent, and if
create_new fails because the file exists, increment index and retry until
success (with a reasonable cap/backoff); return the successfully created PathBuf
so the later fs::write in subagent_runner (calls at lines referenced) can safely
write without clobbering another concurrent writer. Ensure errors propagate via
Result and clean up partially created files on failure if needed.
- Around line 268-320: The parser currently treats malformed message tags as
end-of-input by using `break`, causing truncated files to load silently; in
`parse_messages()` replace those `break` branches (specifically the `else {
break; }` after failing to find `MSG_OPEN_SUFFIX`/role, and the final
double-`find` that falls back to `break` when `MSG_CLOSE` isn't found) with hard
errors (e.g., `anyhow::bail!` or `Err(anyhow!(...))`) that include context (the
current `search_from`/content_start index, the expected token like
`MSG_OPEN_PREFIX`/`MSG_OPEN_SUFFIX`/`MSG_CLOSE`, and a short snippet of the
remaining `raw`), so callers get an explicit parse error instead of a partial
`Ok(messages)` return; keep the successful message pushes using
`unescape_content` and `ChatMessage` unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3d0eda17-61b5-4a1b-81b8-da8769149049

📥 Commits

Reviewing files that changed from the base of the PR and between 3b1ed77 and 1011fa6.

📒 Files selected for processing (3)
  • src/openhuman/agent/harness/session/transcript.rs
  • src/openhuman/agent/harness/subagent_runner.rs
  • src/openhuman/providers/compatible.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/providers/compatible.rs

Comment on lines +166 to +175
```rust
pub fn resolve_new_transcript_path(workspace_dir: &Path, agent_name: &str) -> Result<PathBuf> {
    let date_dir = today_session_dir(workspace_dir);
    fs::create_dir_all(&date_dir)
        .with_context(|| format!("create session dir {}", date_dir.display()))?;

    let sanitized = sanitize_agent_name(agent_name);
    let next_index = next_index(&date_dir, &sanitized)?;
    let filename = format!("{}_{}.md", sanitized, next_index);

    Ok(date_dir.join(filename))
```
Contributor


⚠️ Potential issue | 🟠 Major

Path allocation can collide under concurrent writes.

resolve_new_transcript_path() computes max + 1 and returns the filename without reserving it. The new sub-agent call sites in src/openhuman/agent/harness/subagent_runner.rs at Lines 302-308 and 392-398 can hit this concurrently for the same agent_id, choose the same {agent}_{index}.md, and let the later fs::write overwrite the first transcript. This needs an atomic reservation step (create_new + retry loop, or equivalent) in the allocation/write path.

Also applies to: 339-356

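The atomic reservation the comment asks for can be sketched with `create_new` (O_EXCL semantics). Names below are hypothetical and the loop cap is arbitrary; this is an illustration of the technique, not the PR's code:

```rust
use std::fs::OpenOptions;
use std::io::ErrorKind;
use std::path::{Path, PathBuf};

// Try `{agent}_{index}.md` with create_new so the file is reserved atomically;
// on AlreadyExists bump the index and retry, giving up after a fixed cap.
fn reserve_transcript_path(dir: &Path, agent: &str, start: u32) -> std::io::Result<PathBuf> {
    for index in start..start + 1000 {
        let candidate = dir.join(format!("{agent}_{index}.md"));
        match OpenOptions::new().write(true).create_new(true).open(&candidate) {
            // The empty file now exists on disk, so a later write to this
            // path cannot clobber a concurrent writer's transcript.
            Ok(_) => return Ok(candidate),
            Err(e) if e.kind() == ErrorKind::AlreadyExists => continue,
            Err(e) => return Err(e),
        }
    }
    Err(std::io::Error::new(
        ErrorKind::Other,
        "transcript index space exhausted",
    ))
}
```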

Comment on lines +207 to +223
/// Escape the closing delimiter so it cannot appear literally in message
/// content. Replaces `<!--/MSG-->` with `<!--\/MSG-->`.
///
/// **Known edge case:** if the original content already contains the literal
/// escape sequence `<!--\/MSG-->`, `unescape_content` will convert it into
/// `<!--/MSG-->`, corrupting the round-trip. In practice this sequence is
/// vanishingly unlikely in LLM output. A fully reversible fix would require
/// a two-pass escaping scheme (e.g. also escaping the backslash), but that
/// added complexity is not warranted unless the edge case materialises.
fn escape_content(content: &str) -> String {
content.replace(MSG_CLOSE, MSG_CLOSE_ESCAPED)
}

/// Reverse the escaping applied by [`escape_content`]. See that function's
/// doc comment for the known edge case with pre-existing escape sequences.
fn unescape_content(content: &str) -> String {
content.replace(MSG_CLOSE_ESCAPED, MSG_CLOSE)
Contributor


⚠️ Potential issue | 🟠 Major

Make the content escaping bijective.

This escape/unescape pair still cannot round-trip every valid ChatMessage.content: a literal <!--\/MSG--> is read back as <!--/MSG-->. That breaks the core “exact bytes back out” guarantee for any prompt or tool output that already contains the escaped token. Please switch to a fully reversible encoding here (for example, a length-delimited body or another bijective escaping scheme).

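One fully reversible scheme of the kind the comment asks for can be sketched as follows: escape backslashes first on write and undo them last on read, so a pre-existing literal `<!--\/MSG-->` survives the round-trip. This is an illustration only, not the PR's eventual fix:

```rust
// Bijective two-pass escaping. Pass 1 doubles every backslash, so any
// backslash remaining single after pass 2 must have been introduced by the
// delimiter escape; unescaping reverses the passes in the opposite order.
const MSG_CLOSE: &str = "<!--/MSG-->";
const MSG_CLOSE_ESCAPED: &str = "<!--\\/MSG-->";

fn escape_content(content: &str) -> String {
    content
        .replace('\\', "\\\\") // pass 1: escape backslashes
        .replace(MSG_CLOSE, MSG_CLOSE_ESCAPED) // pass 2: escape the delimiter
}

fn unescape_content(content: &str) -> String {
    content
        .replace(MSG_CLOSE_ESCAPED, MSG_CLOSE) // reverse pass 2 first
        .replace("\\\\", "\\") // then reverse pass 1
}
```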

Comment on lines +268 to +320
fn parse_messages(raw: &str) -> Result<Vec<ChatMessage>> {
let mut messages = Vec::new();
let mut search_from = 0;

loop {
// Find next <!--MSG role="..."--> opening tag
let Some(open_start) = raw[search_from..].find(MSG_OPEN_PREFIX) else {
break;
};
let open_start = search_from + open_start;
let after_prefix = open_start + MSG_OPEN_PREFIX.len();

// Extract role from between the quotes
let Some(role_end) = raw[after_prefix..].find(MSG_OPEN_SUFFIX) else {
break;
};
let role = raw[after_prefix..after_prefix + role_end].to_string();

// Content starts after the opening tag + newline
let content_start = after_prefix + role_end + MSG_OPEN_SUFFIX.len();
let content_start = if raw[content_start..].starts_with('\n') {
content_start + 1
} else {
content_start
};

// Find closing tag
let close_tag = format!("\n{MSG_CLOSE}");
let Some(content_end_rel) = raw[content_start..].find(&close_tag) else {
// Try without leading newline for empty content
let Some(content_end_rel) = raw[content_start..].find(MSG_CLOSE) else {
break;
};
let content = &raw[content_start..content_start + content_end_rel];
messages.push(ChatMessage {
role,
content: unescape_content(content),
});
search_from = content_start + content_end_rel + MSG_CLOSE.len();
continue;
};

let content = &raw[content_start..content_start + content_end_rel];
messages.push(ChatMessage {
role,
content: unescape_content(content),
});

search_from = content_start + content_end_rel + close_tag.len();
}

Ok(messages)
}
Contributor


⚠️ Potential issue | 🟠 Major

Malformed transcripts currently load as partial history.

The error branches in parse_messages() break and then return Ok(messages), so a truncated or hand-edited file is treated as a valid-but-shorter conversation. For resume/cache reuse that is worse than failing fast, because it silently changes the prefix sent back to the model. Please turn these malformed-tag paths into hard parse errors with context.

Suggested direction
```diff
-        let Some(role_end) = raw[after_prefix..].find(MSG_OPEN_SUFFIX) else {
-            break;
-        };
+        let Some(role_end) = raw[after_prefix..].find(MSG_OPEN_SUFFIX) else {
+            anyhow::bail!(
+                "malformed transcript: unterminated role header at byte {}",
+                open_start
+            );
+        };
 ...
-        let Some(content_end_rel) = raw[content_start..].find(&close_tag) else {
+        let Some(content_end_rel) = raw[content_start..].find(&close_tag) else {
             // Try without leading newline for empty content
-            let Some(content_end_rel) = raw[content_start..].find(MSG_CLOSE) else {
-                break;
-            };
+            let Some(content_end_rel) = raw[content_start..].find(MSG_CLOSE) else {
+                anyhow::bail!(
+                    "malformed transcript: missing closing tag for role '{}'",
+                    role
+                );
+            };
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn parse_messages(raw: &str) -> Result<Vec<ChatMessage>> {
    let mut messages = Vec::new();
    let mut search_from = 0;
    loop {
        // Find next <!--MSG role="..."--> opening tag
        let Some(open_start) = raw[search_from..].find(MSG_OPEN_PREFIX) else {
            break;
        };
        let open_start = search_from + open_start;
        let after_prefix = open_start + MSG_OPEN_PREFIX.len();
        // Extract role from between the quotes
        let Some(role_end) = raw[after_prefix..].find(MSG_OPEN_SUFFIX) else {
            break;
        };
        let role = raw[after_prefix..after_prefix + role_end].to_string();
        // Content starts after the opening tag + newline
        let content_start = after_prefix + role_end + MSG_OPEN_SUFFIX.len();
        let content_start = if raw[content_start..].starts_with('\n') {
            content_start + 1
        } else {
            content_start
        };
        // Find closing tag
        let close_tag = format!("\n{MSG_CLOSE}");
        let Some(content_end_rel) = raw[content_start..].find(&close_tag) else {
            // Try without leading newline for empty content
            let Some(content_end_rel) = raw[content_start..].find(MSG_CLOSE) else {
                break;
            };
            let content = &raw[content_start..content_start + content_end_rel];
            messages.push(ChatMessage {
                role,
                content: unescape_content(content),
            });
            search_from = content_start + content_end_rel + MSG_CLOSE.len();
            continue;
        };
        let content = &raw[content_start..content_start + content_end_rel];
        messages.push(ChatMessage {
            role,
            content: unescape_content(content),
        });
        search_from = content_start + content_end_rel + close_tag.len();
    }
    Ok(messages)
}
fn parse_messages(raw: &str) -> Result<Vec<ChatMessage>> {
    let mut messages = Vec::new();
    let mut search_from = 0;
    loop {
        // Find next <!--MSG role="..."--> opening tag
        let Some(open_start) = raw[search_from..].find(MSG_OPEN_PREFIX) else {
            break;
        };
        let open_start = search_from + open_start;
        let after_prefix = open_start + MSG_OPEN_PREFIX.len();
        // Extract role from between the quotes
        let Some(role_end) = raw[after_prefix..].find(MSG_OPEN_SUFFIX) else {
            anyhow::bail!(
                "malformed transcript: unterminated role header at byte {}",
                open_start
            );
        };
        let role = raw[after_prefix..after_prefix + role_end].to_string();
        // Content starts after the opening tag + newline
        let content_start = after_prefix + role_end + MSG_OPEN_SUFFIX.len();
        let content_start = if raw[content_start..].starts_with('\n') {
            content_start + 1
        } else {
            content_start
        };
        // Find closing tag
        let close_tag = format!("\n{MSG_CLOSE}");
        let Some(content_end_rel) = raw[content_start..].find(&close_tag) else {
            // Try without leading newline for empty content
            let Some(content_end_rel) = raw[content_start..].find(MSG_CLOSE) else {
                anyhow::bail!(
                    "malformed transcript: missing closing tag for role '{}'",
                    role
                );
            };
            let content = &raw[content_start..content_start + content_end_rel];
            messages.push(ChatMessage {
                role,
                content: unescape_content(content),
            });
            search_from = content_start + content_end_rel + MSG_CLOSE.len();
            continue;
        };
        let content = &raw[content_start..content_start + content_end_rel];
        messages.push(ChatMessage {
            role,
            content: unescape_content(content),
        });
        search_from = content_start + content_end_rel + close_tag.len();
    }
    Ok(messages)
}
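A self-contained sketch of the fail-fast pattern the suggestion describes, with a test pinning down the behavior difference. The tag constants and `ChatMessage` struct here mirror the names used in `transcript.rs` but are assumptions, not the actual definitions, and `Err(String)` stands in for the `anyhow` error type:

```rust
// Simplified fail-fast variant of the transcript parser. Constants and the
// ChatMessage struct are assumed stand-ins mirroring transcript.rs names.
const MSG_OPEN_PREFIX: &str = "<!--MSG role=\"";
const MSG_OPEN_SUFFIX: &str = "\"-->";
const MSG_CLOSE: &str = "<!--/MSG-->";

#[derive(Debug)]
struct ChatMessage {
    role: String,
    content: String,
}

fn parse_messages(raw: &str) -> Result<Vec<ChatMessage>, String> {
    let mut messages = Vec::new();
    let mut search_from = 0;
    loop {
        let Some(open_start) = raw[search_from..].find(MSG_OPEN_PREFIX) else {
            break; // no more opening tags: the normal end of input
        };
        let open_start = search_from + open_start;
        let after_prefix = open_start + MSG_OPEN_PREFIX.len();
        let Some(role_end) = raw[after_prefix..].find(MSG_OPEN_SUFFIX) else {
            // Hard error instead of silently truncating the history.
            return Err(format!(
                "malformed transcript: unterminated role header at byte {open_start}"
            ));
        };
        let role = raw[after_prefix..after_prefix + role_end].to_string();
        let mut content_start = after_prefix + role_end + MSG_OPEN_SUFFIX.len();
        if raw[content_start..].starts_with('\n') {
            content_start += 1;
        }
        let close_tag = format!("\n{MSG_CLOSE}");
        let (end_rel, tag_len) = match raw[content_start..].find(&close_tag) {
            Some(i) => (i, close_tag.len()),
            // Empty content has no leading newline before the close tag.
            None => match raw[content_start..].find(MSG_CLOSE) {
                Some(i) => (i, MSG_CLOSE.len()),
                None => {
                    return Err(format!(
                        "malformed transcript: missing closing tag for role '{role}'"
                    ))
                }
            },
        };
        messages.push(ChatMessage {
            role,
            content: raw[content_start..content_start + end_rel].to_string(),
        });
        search_from = content_start + end_rel + tag_len;
    }
    Ok(messages)
}

fn main() {
    // Well-formed input still parses.
    let ok = "<!--MSG role=\"user\"-->\nhello\n<!--/MSG-->";
    assert_eq!(parse_messages(ok).unwrap().len(), 1);
    // A truncated transcript now fails loudly instead of loading as a
    // shorter history (which would silently change the KV-cache prefix).
    let truncated = "<!--MSG role=\"user\"-->\nhello";
    assert!(parse_messages(truncated).is_err());
}
```

With the `break` version, the truncated input above would return `Ok` with zero messages and the resumed session would re-prefill from scratch with a different prefix; the hard error surfaces the corruption at load time instead.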
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/session/transcript.rs` around lines 268-320,
the parser currently treats malformed message tags as end-of-input by using
`break`, causing truncated files to load silently. In `parse_messages()`, replace
those `break` branches (specifically the `else { break; }` after failing to find
`MSG_OPEN_SUFFIX` for the role, and the fallback `break` when neither close-tag
`find` matches `MSG_CLOSE`) with hard errors (e.g., `anyhow::bail!` or
`Err(anyhow!(...))`) that include context: the current
`search_from`/`content_start` index, the expected token such as
`MSG_OPEN_PREFIX`/`MSG_OPEN_SUFFIX`/`MSG_CLOSE`, and a short snippet of the
remaining `raw`. Callers then get an explicit parse error instead of a partial
`Ok(messages)` return. Keep the successful message pushes using
`unescape_content` and `ChatMessage` unchanged.

@senamakel senamakel merged commit ace0006 into tinyhumansai:main Apr 12, 2026
9 checks passed